Currently, randomization is a widely used approach in sim2real transfer of data-driven learning algorithms in robotics. Nonetheless, most sim2real studies report results for a specific randomization technique, often on a highly customized robot system, which makes it difficult to evaluate different randomization approaches systematically. To address this, we define an easy-to-reproduce experimental setup for a robotic reaching task on a manipulator, which can serve as a benchmark for comparison. We compare four randomization strategies with three randomized parameters, both in simulation and on a real robot. Our results show that more randomization helps in sim2real transfer, but it can also harm the ability of the algorithm to find a good policy in simulation. Fully randomized simulation combined with fine-tuning shows differentiated results and transfers to the real robot better than the other tested methods.
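As a rough sketch of the kind of domain randomization compared above. The parameter names, value ranges, and per-episode resampling scheme below are illustrative assumptions, not details taken from the paper:

```python
import random

# Illustrative randomization ranges; the actual parameters and bounds
# used in the study are assumptions for this sketch.
RANDOMIZATION_RANGES = {
    "joint_friction": (0.01, 0.2),    # assumed range
    "link_mass_scale": (0.8, 1.2),    # assumed range
    "sensor_noise_std": (0.0, 0.05),  # assumed range
}

def sample_simulation_params(active_params):
    """Resample one value per randomized parameter, e.g. before each
    training episode; a partial strategy randomizes only a subset."""
    return {
        name: random.uniform(*RANDOMIZATION_RANGES[name])
        for name in active_params
    }

# Fully randomized strategy: resample every parameter each episode.
episode_params = sample_simulation_params(RANDOMIZATION_RANGES)
```

A partial strategy would pass only some keys, which matches the paper's finding that the degree of randomization trades off transferability against ease of finding a good policy in simulation.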
Human capabilities in object perception are impressive, and this becomes even more evident when one tries to develop robotic solutions with similar abilities. Drawing inspiration from how humans use vision and touch for object perception and related tasks, this paper summarizes the current state of multimodal object perception for robotic applications. It covers aspects of biological inspiration, sensor technologies, datasets, and the processing of sensory data for object recognition and grasping. First, an overview of the biological basis of multimodal object perception is given. Sensing technologies and data collection strategies are then discussed. Next, the main computational aspects are introduced, highlighting representative articles for each main application area, including object recognition, transfer learning, and object manipulation and grasping. Finally, building on the current advances in each area, the paper outlines promising new research directions.
Developed as a solution to a practical need, active learning (AL) methods aim to reduce label complexity and annotation costs in supervised learning. While recent work has demonstrated the benefit of using AL in combination with large pre-trained language models (PLMs), it has often overlooked the practical challenges that hinder the feasibility of AL in realistic settings. We address these challenges by leveraging representation smoothness analysis to improve the effectiveness of AL. We develop an early stopping technique that does not require a validation set -- often unavailable in realistic AL settings -- and observe significant improvements across multiple datasets and AL methods. Additionally, we find that task adaptation improves AL, whereas standard short fine-tuning in AL does not provide improvements over random sampling. Our work establishes the usefulness of representation smoothness analysis in AL and presents an AL stopping criterion that reduces label complexity.
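A validation-free stopping rule of the sort described above can be illustrated generically: track a scalar criterion over epochs (in the paper's case derived from representation smoothness of the model's layers) and stop once it stops improving. The patience-based rule below is a simplified stand-in, not the paper's actual criterion:

```python
def should_stop(criterion_history, patience=3, tol=1e-3):
    """Stop once the tracked criterion (lower is better in this sketch)
    has failed to improve on its earlier best for `patience` epochs.
    No held-out validation set is consulted."""
    if len(criterion_history) <= patience:
        return False
    best_before = min(criterion_history[:-patience])
    return all(v >= best_before - tol
               for v in criterion_history[-patience:])
```

The key property mirrored here is that the decision depends only on a quantity computed from the model and training data, which is what makes such a criterion usable when no validation set exists.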
Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on multiple datasets becomes a method of choice towards strong generalization in usual scenes and graceful performance degradation in edge cases. Unfortunately, different datasets often have incompatible labels. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes etc. Furthermore, many datasets have overlapping labels. For instance, pickups are labeled as trucks in VIPER, cars in Vistas, and vans in ADE20k. We address this challenge by considering labels as unions of universal visual concepts. This allows seamless and principled learning on multi-domain dataset collections without requiring any relabeling effort. Our method achieves competitive within-dataset and cross-dataset generalization, as well as ability to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash 2 benchmark.
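The "labels as unions of universal visual concepts" idea can be sketched as follows. The concept taxonomy and label-to-concept mappings here are invented for illustration, loosely following the road example from the abstract:

```python
# Hypothetical universal concepts; each dataset label is modeled as a
# union (set) of these, so the probability of a dataset label is the
# sum of the predicted probabilities of the concepts it subsumes.
UNIVERSAL_CONCEPTS = ["road_surface", "road_marking", "manhole"]

LABEL_TO_CONCEPTS = {
    # Cityscapes-style: one broad road label covers all driving surfaces.
    "cityscapes/road": {"road_surface", "road_marking", "manhole"},
    # Vistas-style: finer labels map to single concepts.
    "vistas/road": {"road_surface"},
    "vistas/marking": {"road_marking"},
}

def label_probability(concept_probs, label):
    """Aggregate universal-concept probabilities into the probability
    of a coarser dataset-specific label."""
    return sum(concept_probs[c] for c in LABEL_TO_CONCEPTS[label])
```

Training against dataset labels aggregated this way is what allows a single model to learn the finer universal concepts without any relabeling, even when a given dataset never distinguishes them.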
Current state-of-the-art approaches to text classification typically leverage BERT-style Transformer models with a softmax classifier, jointly fine-tuned to predict class labels of a target task. In this paper, we instead propose an alternative training objective in which we learn task-specific embeddings of text: our proposed objective learns embeddings such that all texts that share the same target class label should be close together in the embedding space, while all others should be far apart. This allows us to replace the softmax classifier with a more interpretable k-nearest-neighbor classification approach. In a series of experiments, we show that this yields a number of interesting benefits: (1) The resulting order induced by distances in the embedding space can be used to directly explain classification decisions. (2) This facilitates qualitative inspection of the training data, helping us to better understand the problem space and identify labelling quality issues. (3) The learned distances to some degree generalize to unseen classes, allowing us to incrementally add new classes without retraining the model. We present extensive experiments which show that the benefits of ante-hoc explainability and incremental learning come at no cost in overall classification accuracy, thus pointing to practical applicability of our proposed approach.
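A minimal sketch of the k-nearest-neighbor classification step over learned embeddings (the embedding model itself is omitted). Returning the neighbor indices alongside the prediction reflects the paper's point that the nearest training examples directly explain each decision:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, embeddings, labels, k=3):
    """Classify `query` by majority vote among its k nearest training
    embeddings; the neighbour indices double as an explanation."""
    nearest = sorted(range(len(embeddings)),
                     key=lambda i: euclidean(query, embeddings[i]))[:k]
    vote = Counter(labels[i] for i in nearest).most_common(1)[0][0]
    return vote, nearest
```

Incremental learning of new classes then amounts to appending newly labeled embeddings to `embeddings`/`labels` with no retraining, exactly the property the abstract highlights.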
In today's data-driven society, supervised machine learning is rapidly evolving, and the need for labeled data is increasing. However, the process of acquiring labels is often expensive and tedious. For this reason, we developed ALANNO, an open-source annotation system for NLP tasks powered by active learning. We focus on the practical challenges in deploying active learning systems and try to find solutions to make active learning effective in real-world applications. We support the system with a wealth of active learning methods and underlying machine learning models. In addition, we leave open the possibility to add new methods, which makes the platform useful for both high-quality data annotation and research purposes.
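The pool-based active learning loop that an annotation system like ALANNO orchestrates can be sketched generically. Here `train_fn`, `acquire_fn`, and `oracle` are placeholders for a pluggable model trainer, an AL acquisition method, and a human annotator; none of these names come from the ALANNO codebase:

```python
def active_learning_loop(pool, oracle, train_fn, acquire_fn,
                         rounds, batch_size):
    """Generic pool-based active learning: repeatedly rank the
    unlabeled pool with the current model, query the oracle for the
    top batch, and retrain on the enlarged labeled set."""
    labeled, unlabeled = [], list(pool)
    model = train_fn(labeled)  # initial (possibly untrained) model
    for _ in range(rounds):
        picked = acquire_fn(model, unlabeled)[:batch_size]
        for x in picked:
            labeled.append((x, oracle(x)))
            unlabeled.remove(x)
        model = train_fn(labeled)
    return model, labeled
```

Swapping `acquire_fn` (uncertainty sampling, core-set selection, etc.) without touching the rest of the loop is what makes such a platform useful both for annotation and for AL research.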
Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained for classification on a large "upstream" dataset, since such labels are easy to collect, and then fine-tuned on a "downstream" task such as action localisation, which is smaller due to its finer-grained annotations. In this paper, we question this approach and propose co-finetuning: simultaneously training a single model on multiple "upstream" and "downstream" tasks. We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data, and also show how we can easily extend the approach to multiple "upstream" datasets to further improve performance. In particular, co-finetuning significantly improves performance on rare classes in our downstream tasks, as it has a regularizing effect and enables the network to learn feature representations that transfer between different datasets. Finally, we observe how co-finetuning with public video classification datasets enables us to achieve state-of-the-art results for spatio-temporal action localisation on the challenging AVA and AVA-Kinetics datasets, outperforming recent works that develop intricate models.
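One simple way to realize "simultaneously training on multiple tasks" is to interleave batches from the upstream and downstream datasets into a single training stream for a shared model. The round-robin schedule below is an assumed simplification for illustration, not the paper's exact sampling scheme:

```python
import itertools

def cofinetune_schedule(datasets, num_steps):
    """Yield (dataset_name, batch) pairs, cycling through the datasets
    so one shared model sees upstream and downstream data together
    rather than in separate pretrain/finetune phases."""
    batch_iters = {name: itertools.cycle(batches)
                   for name, batches in datasets.items()}
    name_order = itertools.cycle(datasets)
    for _ in range(num_steps):
        name = next(name_order)
        yield name, next(batch_iters[name])
```

In practice each yielded batch would be passed through the shared backbone with a task-specific head and loss; the interleaving itself is the source of the regularizing effect the abstract describes.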
Deep supervised models have an unprecedented capacity to absorb large quantities of training data. Hence, training on many datasets becomes a method of choice for graceful performance degradation in unusual scenes. Unfortunately, different datasets often use incompatible labels. For instance, the Cityscapes road class subsumes all driving surfaces, while Vistas defines separate classes for road markings, manholes etc. We address this challenge by proposing a method for seamless learning on datasets with overlapping classes, based on partial labels and a probabilistic loss. Our method achieves competitive within-dataset and cross-dataset generalization, as well as the ability to learn visual concepts which are not separately labeled in any of the training datasets. Experiments reveal competitive or state-of-the-art performance on two multi-domain dataset collections and on the WildDash benchmark.
High-quality estimates of uncertainty and robustness are crucial for numerous real-world applications, especially for deep learning, which underlies many deployed ML systems. The ability to compare techniques that improve these estimates is therefore very important for research and practice alike. Yet, competitive comparisons of methods are often lacking for a range of reasons, including: compute availability for extensive tuning, inclusion of sufficiently many baselines, and concrete documentation for reproducibility. In this paper we introduce Uncertainty Baselines: high-quality implementations of standard and state-of-the-art deep learning methods on a variety of tasks. As of this writing, the collection spans 19 methods across 9 tasks, each with at least 5 metrics. Each baseline is a self-contained experiment pipeline with easily reusable and extendable components. Our goal is to provide immediate starting points for experimentation with new methods or applications. Additionally, we provide model checkpoints, experiment outputs as Python notebooks, and leaderboards for comparing results. Code is available at https://github.com/google/uncertainty-baselines.
Dense semantic forecasting anticipates future events in video by inferring the pixel-level semantics of unobserved future images. We present a novel approach that is applicable to various single-frame architectures and tasks. Our approach consists of two modules. The feature-to-motion (F2M) module forecasts a dense deformation field that warps past features into their future positions. The feature-to-feature (F2F) module regresses future features directly and is therefore able to account for emergent scenery. The compound F2MF model decouples the effects of motion from the effects of novelty in a task-agnostic manner. We aim to apply F2MF forecasting to the most subsampled and most abstract representation of a desired single-frame model. Our design leverages deformable convolutions and spatial correlation coefficients between neighbouring time instants. We perform experiments on three dense prediction tasks: semantic segmentation, instance-level segmentation, and panoptic segmentation. The results reveal state-of-the-art forecasting accuracy across all three tasks.
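The core operation of the F2M module, warping past features along a forecast deformation field, can be illustrated with a toy nearest-neighbour warp on a 2D grid. This is a deliberate simplification: the actual model forecasts sub-pixel deformations and realizes the warp with deformable convolutions:

```python
def warp_features(features, flow):
    """Warp an HxW feature map by per-pixel integer displacements.
    flow[y][x] = (dy, dx) says the value at (y, x) in the future frame
    comes from (y - dy, x - dx) in the past frame; out-of-bounds
    sources are left at zero (disoccluded / emergent regions, which
    an F2F-style module would have to fill in)."""
    h, w = len(features), len(features[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < h and 0 <= src_x < w:
                warped[y][x] = features[src_y][src_x]
    return warped
```

The zero-filled cells show why motion-based warping alone cannot explain novelty, which is exactly the gap the compound F2MF model covers with its direct feature regression branch.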